Use cuda virtual memory management and merge blocks #36189
Conversation
… auto_growth_v2
auto result =
    paddle::platform::dynload::cuDeviceGet(&device, p.GetDeviceId());
PADDLE_ENFORCE_EQ(
    result, CUDA_SUCCESS,
    platform::errors::Fatal("Call CUDA API cuDeviceGet failed, return %d.",
                            result));
PADDLE_ENFORCE_CUDA_SUCCESS ?
Thanks, fixed.
    result, CUDA_SUCCESS,
    platform::errors::Fatal(
        "Call CUDA API cuDeviceGetAttribute failed, return %d.", result));
} catch (...) {
In which case may it raise an exception?
I see, please add comments on that.
@@ -131,6 +131,20 @@ gpuError_t RecordedCudaMalloc(void **ptr, size_t size, int dev_id);
//! CudaFree with recorded info
void RecordedCudaFree(void *p, size_t size, int dev_id);

#ifdef PADDLE_WITH_CUDA
It seems not needed.
I was not sure whether CUDA_VERSION would be well-defined when building without CUDA, so I guarded it to be safe.
There is #ifdef PADDLE_WITH_CUDA at the beginning of the file, so I think this one may be duplicated.
__macro(cuInit);                                      \
__macro(cuDriverGetVersion);                          \
__macro(cuGetErrorString);                            \
__macro(cuModuleLoadData);                            \
__macro(cuModuleGetFunction);                         \
__macro(cuModuleUnload);                              \
__macro(cuOccupancyMaxActiveBlocksPerMultiprocessor); \
__macro(cuLaunchKernel);                              \
__macro(cuCtxCreate);                                 \
__macro(cuCtxGetCurrent);                             \
__macro(cuDeviceGetCount);                            \
__macro(cuDevicePrimaryCtxGetState);                  \
These duplicate the APIs in the #else branch, so maybe the #else is not needed.
Some of these APIs are only available in CUDA 10.2 and later, so they are guarded with a macro.
I think you can always define the APIs that exist in both version < 10.2 and >= 10.2 without the macro.
paddle/fluid/platform/gpu_info.cc
Outdated
@@ -641,6 +646,30 @@ class RecordedCudaMallocHelper {

uint64_t LimitSize() const { return limit_size_; }

#ifdef PADDLE_WITH_CUDA
#if CUDA_VERSION >= 10020
CUresult cuMemCreate(CUmemGenericAllocationHandle *handle, size_t size,
Better name the member function like other functions, for example, "CreateMem". (Start with upper case)
Done, thanks!
paddle::platform::CUDADeviceGuard guard(place.device);
PADDLE_ENFORCE_CUDA_SUCCESS(cudaMemGetInfo(&actual_avail, &actual_total));

virtual_mem_size_ = (actual_total + granularity_ - 1) & ~(granularity_ - 1);
Why do this?
This reserves a virtual address space whose size exactly matches the GPU's total device memory, so that the one-time virtual address reservation is guaranteed to be large enough.
virtual_mem_size_ = (actual_total + granularity_ - 1) & ~(granularity_ - 1);
This has been changed to use a function call.
block--;
auto pre = block;
block++;
block++;
auto next = block;
block--;
I cannot easily understand it...
Isn't it easy enough to understand with pre and next?
Changed to use std::next and std::prev.
LGTM
PR types
New features
PR changes
Others
Describe
A new allocator (AutoGrowthV2) is implemented on top of the NVIDIA Virtual Memory Management (VMM) mechanism.
With VMM, memory blocks requested from CUDA can be merged (the old AutoGrowth allocator cannot merge memory blocks).